NSF PAR Search | NSF Public Access Repository

Disaggregated GPU Acceleration for Serverless Applications

https://doi.org/10.1145/3606557.3606560

Fingler, Henrique; Zhu, Zhiting; Yoon, Esther; Jia, Zhipeng; Witchel, Emmett; Rossbach, Christopher J. (June 2023, ACM SIGOPS Operating Systems Review)

Serverless platforms have been attracting applications from traditional platforms because infrastructure management responsibilities are shifted from users to providers. Many applications well-suited to serverless environments could leverage GPU acceleration to enhance their performance. Unfortunately, current serverless platforms do not expose GPUs to serverless applications.
Full Text Available
Towards a Machine Learning-Assisted Kernel with LAKE

https://doi.org/10.1145/3575693.3575697

Fingler, Henrique; Tarte, Isha; Yu, Hangchen; Szekely, Ariel; Hu, Bodun; Akella, Aditya; Rossbach, Christopher J. (January 2023, ASPLOS 2023)

Full Text Available
DGSF: Disaggregated GPUs for Serverless Functions

https://doi.org/10.1109/IPDPS53621.2022.00077

Fingler, Henrique; Zhu, Zhiting; Yoon, Esther; Jia, Zhipeng; Witchel, Emmett (April 2022, IEEE International Parallel and Distributed Processing Symposium)

Ease of use and transparent access to elastic resources have attracted many applications away from traditional platforms toward serverless functions. Many of these applications, such as machine learning, could benefit significantly from GPU acceleration. Unfortunately, GPUs remain inaccessible from serverless functions in modern production settings. We present DGSF, a platform that transparently enables serverless functions to use GPUs through general purpose APIs such as CUDA. DGSF solves provisioning and utilization challenges with disaggregation, serving the needs of a potentially large number of functions through virtual GPUs backed by a small pool of physical GPUs on dedicated servers. Disaggregation allows the provider to decouple GPU provisioning from other resources, and enables significant benefits through consolidation. We describe how DGSF solves GPU disaggregation challenges including supporting API transparency, hiding the latency of communication with remote GPUs, and load-balancing access to heavily shared GPUs. Evaluation of our prototype on six workloads shows that DGSF’s API remoting optimizations can improve the runtime of a function by up to 50% relative to unoptimized DGSF. Such optimizations, which aggressively remove GPU runtime and object management latency from the critical path, can enable functions running over DGSF to have a lower end-to-end time than when running on a GPU natively. By enabling GPU sharing, DGSF can reduce function queueing latency by up to 53%. We use DGSF to augment AWS Lambda with GPU support, showing similar benefits.
Full Text Available
Parla: a Python orchestration system for heterogeneous architectures

https://doi.org/10.1109/SC41404.2022.00056

Lee, Hochan; Ruys, William; Henriksen, Ian; Peters, Arthur; Yan, Yineng; Stephens, Sean; You, Bozhi; Fingler, Henrique; Burtscher, Martin; Gligoric, Milos; et al. (November 2022, SC '22: Proceedings of the International Conference on High Performance Computing, Networking, Storage and Analysis)

Python's ease of use and rich collection of numeric libraries make it an excellent choice for rapidly developing scientific applications. However, composing these libraries to take advantage of complex heterogeneous nodes is still difficult. To simplify writing multi-device code, we created Parla, a heterogeneous task-based programming framework that fully supports Python's scientific programming stack. Parla's API is based on Python decorators and allows users to wrap code in Parla tasks for parallel execution. Parla arrays enable automatic movement of data between devices. The Parla runtime handles resource-aware mapping, scheduling, and execution of tasks. Compared to other Python tasking systems, Parla is unique in its parallelization of tasks within a single process, its GPU context and resource-aware runtime, and its design around gradual adoption to provide easy migration of and integration into existing Python applications. We show that Parla can achieve performance competitive with hand-optimized code while improving ease of development.
Full Text Available
Strata: A Cross Media File System

https://doi.org/10.1145/3132747.3132770

Kwon, Youngjin; Fingler, Henrique; Hunt, Tyler; Peter, Simon; Witchel, Emmett; Anderson, Thomas (October 2017, Proceedings of the 26th Symposium on Operating Systems Principles)

Current hardware and application storage trends put immense pressure on the operating system's storage subsystem. On the hardware side, the market for storage devices has diversified to a multi-layer storage topology spanning multiple orders of magnitude in cost and performance. Above the file system, applications increasingly need to process small, random IO on vast data sets with low latency, high throughput, and simple crash consistency. File systems designed for a single storage layer cannot support all of these demands together. We present Strata, a cross-media file system that leverages the strengths of one storage medium to compensate for weaknesses of another. In doing so, Strata provides performance, capacity, and a simple, synchronous IO model all at once, while having a simpler design than that of file systems constrained by a single storage device. At its heart, Strata uses a log-structured approach with a novel split of responsibilities among user mode, kernel, and storage layers that separates the concerns of scalable, high-performance persistence from storage layer management. We quantify the performance benefits of Strata using a 3-layer storage hierarchy of emulated NVM, a flash-based SSD, and a high-density HDD. Strata has 20-30% better latency and throughput, across several unmodified applications, compared to file systems purpose-built for each layer, while providing synchronous and unified access to the entire storage hierarchy. Finally, Strata achieves up to 2.8x better throughput than a block-based 2-layer cache provided by Linux's logical volume manager.
Full Text Available
